Cardiovascular disease (CVD) occurs when the blood vessels supplying the heart become blocked, for example by clots, leading to heart attacks and, in severe cases, death. The World Health Organisation estimates that around 17 million people lose their lives to cardiovascular disease. Early diagnosis is essential to reduce this mortality, and the past few decades have seen significant advances in vessel analysis, including the identification and treatment of diseases of internal organs. Non-invasive medical imaging techniques such as magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound provide advanced qualitative and quantitative assessments of the anatomical structure and function of internal organs, supporting diagnosis, disease monitoring, and treatment decisions. Manual analysis of chamber volumes, by contrast, is prone to subjective error that can cost lives.
Image segmentation of such scans enables quantitative measurements such as the volume of the left ventricle (LV), the volume of the right ventricle (RV), and the mass of the myocardium (MYO). This project applies advanced deep-learning segmentation models to cardiovascular magnetic resonance (CMR) images; heart segmentation typically targets the left atrium, right atrium, left ventricle, and coronary arteries. The data for this project were taken from the ACDC1 challenge and further pre-processed by the module lead of the Neural Computing coursework. The dataset contains 200 cardiovascular images with ground-truth masks in PNG format, split 50% for training, 10% for validation, and 40% for testing; the training set therefore comprises 100 CMR images and their 100 corresponding ground-truth masks.
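To make the quantitative-measurement idea concrete, the following is a minimal sketch of how a structure volume can be read off a labelled mask by counting voxels per class. This is illustrative only: the label mapping (1 = RV, 2 = MYO, 3 = LV) and the unit voxel spacing are assumptions for the example, not values taken from the coursework data.

```python
import numpy as np

def structure_volume(mask: np.ndarray, label: int, voxel_volume_mm3: float = 1.0) -> float:
    """Volume of one labelled structure = voxel count x voxel volume."""
    return float(np.count_nonzero(mask == label)) * voxel_volume_mm3

# Toy 4x4 mask with values 0-3, in the style of the ACDC ground truth
toy_mask = np.array([[0, 0, 1, 1],
                     [0, 2, 2, 1],
                     [3, 3, 2, 0],
                     [3, 0, 0, 0]])

print(structure_volume(toy_mask, 3))  # label 3 covers 3 voxels -> 3.0 with unit voxels
```

In practice the per-voxel volume would come from the scan's pixel spacing and slice thickness metadata.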
CUDA is a parallel computing platform and programming model used to accelerate compute-intensive workloads. With PyTorch's CUDA support, the framework keeps track of the currently selected GPU, and all CUDA tensors allocated are created on that device by default. OpenCV, an open-source computer vision library, was used to import the dataset and convert it to grayscale, and the image data were plotted with the Matplotlib Python library.
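A minimal sketch of this device-selection pattern (standard PyTorch usage, not coursework-specific code): tensors created with an explicit `device` argument, or moved with `.to(device)`, live on the GPU when one is available and fall back to the CPU otherwise.

```python
import torch

# Select the GPU if CUDA is available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors can be allocated directly on the selected device
x = torch.zeros(2, 2, device=device)
print(device.type, x.device.type)
```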
The ACDC1 challenge provided 100 training images with corresponding masks. Because the dataset is small, we were restricted to a comparatively simple model for image segmentation. A research survey showed that U-Net is a popular model for biomedical image segmentation [1]; it is also simple enough to adapt to the segmentation task at hand.
The model is divided into two legs: a downsampling (encoder) path and an upsampling (decoder) path. The encoder first deconstructs the image into feature maps, and the decoder then reconstructs the essential parts of the image to give the desired output.
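The effect of the two legs on tensor shape can be sketched with the same standard PyTorch layers the model below uses (illustrative shapes only): max-pooling halves the spatial size on the way down, and bilinear upsampling doubles it on the way up, restoring the original height and width.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 96, 96)  # (N, C, H, W) input, as in the coursework data

down = nn.MaxPool2d(2)(x)  # downsampling leg: 96x96 -> 48x48
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)(down)  # back to 96x96

print(down.shape, up.shape)
```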
Initially we began with a basic UNet model adaptation:
The model consists of 4 downsampling blocks followed by 3 upsampling blocks. Each sequential block contains a Conv2d, a ReLU activation, another Conv2d, and another ReLU activation. The input is first passed to a pre-convolution layer that changes the image size to (3, 96, 96) before feeding it into the model. The model is adapted to output a predicted mask of size (4, 96, 96), where 4 is the number of classes. [15]
The implementation is as follows:
#UNET_Modified1
from torchsummary import summary
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('device :', device)

def double_conv(in_channels, out_channels, mid_channels=None):
    if not mid_channels:
        mid_channels = out_channels
    return nn.Sequential(
        nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True)
    )

class UNet_Modified(nn.Module):
    def __init__(self, n_class=4):
        super().__init__()
        self.dconv_pre = double_conv(1, 3)
        self.dconv_down1 = double_conv(3, 64)
        self.dconv_down2 = double_conv(64, 128)
        self.dconv_down3 = double_conv(128, 256)
        self.dconv_down4 = double_conv(256, 512)
        # self.dconv_down5 = double_conv(512, 1024)
        self.maxpool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.dconv_up3 = double_conv(256 + 512, 256)
        self.dconv_up2 = double_conv(128 + 256, 128)
        self.dconv_up1 = double_conv(128 + 64, 64)
        self.conv_last = nn.Conv2d(64, n_class, 1)

    def forward(self, x):
        convpre = self.dconv_pre(x)  # no max-pooling here
        conv1 = self.dconv_down1(convpre)
        x = self.maxpool(conv1)
        conv2 = self.dconv_down2(x)
        x = self.maxpool(conv2)
        conv3 = self.dconv_down3(x)
        x = self.maxpool(conv3)
        x = self.dconv_down4(x)
        x = self.upsample(x)
        x = torch.cat([x, conv3], dim=1)
        x = self.dconv_up3(x)
        x = self.upsample(x)
        x = torch.cat([x, conv2], dim=1)
        x = self.dconv_up2(x)
        x = self.upsample(x)
        x = torch.cat([x, conv1], dim=1)
        x = self.dconv_up1(x)
        out = self.conv_last(x)
        return out

# Adapted from https://github.com/usuyama/pytorch-unet/blob/master/pytorch_unet.py
model_UNET1 = UNet_Modified()  # create a model instance from the defined segmentation class
model_UNET1 = model_UNET1.to(device)
print(model_UNET1)
summary(model_UNET1, input_size=(1, 96, 96), batch_size=4)
Upon experimentation, it was observed that adding BatchNormalisation [16] between each Conv2d layer and its activation increases the stability of training by normalising the intermediate activations. The initial model above was modified accordingly and renamed UNET_Modified2. We also removed the convpre layer, which was unnecessary: the input (1, 96, 96) is passed directly to the model and downscaled gradually as needed.
The UNET_Modified2 implementation is as follows:
#UNET_Modified2
from torchsummary import summary
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('device :', device)

def double_conv(in_channels, out_channels, mid_channels=None):
    if not mid_channels:
        mid_channels = out_channels
    return nn.Sequential(
        nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(mid_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )

class UNet_Modified2(nn.Module):
    def __init__(self, n_class=4):
        super().__init__()
        self.dconv_down1 = double_conv(1, 64, 3)
        self.dconv_down2 = double_conv(64, 128)
        self.dconv_down3 = double_conv(128, 256)
        self.dconv_down4 = double_conv(256, 512)
        # self.dconv_down5 = double_conv(512, 1024)
        self.maxpool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.dconv_up3 = double_conv(256 + 512, 256)
        self.dconv_up2 = double_conv(128 + 256, 128)
        self.dconv_up1 = double_conv(128 + 64, 64)
        self.conv_last = nn.Conv2d(64, n_class, 1)

    def forward(self, x):
        conv1 = self.dconv_down1(x)
        x = self.maxpool(conv1)
        conv2 = self.dconv_down2(x)
        x = self.maxpool(conv2)
        conv3 = self.dconv_down3(x)
        x = self.maxpool(conv3)
        x = self.dconv_down4(x)
        x = self.upsample(x)
        x = torch.cat([x, conv3], dim=1)
        x = self.dconv_up3(x)
        x = self.upsample(x)
        x = torch.cat([x, conv2], dim=1)
        x = self.dconv_up2(x)
        x = self.upsample(x)
        x = torch.cat([x, conv1], dim=1)
        x = self.dconv_up1(x)
        out = self.conv_last(x)
        return out

# Adapted from https://github.com/usuyama/pytorch-unet/blob/master/pytorch_unet.py
model_UNET2 = UNet_Modified2()  # create a model instance from the defined segmentation class
model_UNET2 = model_UNET2.to(device)
print(model_UNET2)
summary(model_UNET2, input_size=(1, 96, 96), batch_size=4)
We explored transfer learning to improve the prediction score. Transfer learning [17] takes models pre-trained (with their weights) on large datasets and fine-tunes them on our own data. We use models with the U-Net architecture as the base, via a third-party Python library, Segmentation Models [2], which implements segmentation models with pre-trained backbones in PyTorch. The library provides a wide range of state-of-the-art architectures such as DeepLab, U-Net, and FPN, along with a large number of pre-trained encoders with weights from different datasets/training regimes such as imagenet, imagenet+background, and imagenet+5k. Picking a pre-trained model is as simple as calling a function with the model architecture, encoder weights, number of input channels, and number of classes. The returned pre-trained model accepts inputs of size [N, C, H, W] and produces an output mask of size [N, Class, H, W]. We load this model in training mode and train it on our data, refining the weights for our particular problem. The advantage of transfer learning is that training starts from known low-level feature extractors already learned on huge datasets, giving a tremendous head start over models trained from scratch; as a result, training converges faster and needs fewer epochs. The library also provides implementations of a variety of loss functions such as DiceLoss [4] and SoftCrossEntropyLoss.
2.2.1 EfficientNet B4 architecture [11]:
Image source: https://towardsdatascience.com/complete-architectural-details-of-all-efficientnet-models-5fd5b736142
There are five types of modules used to construct the seven blocks as shown in the figure below:
The required libraries are installed as follows:
#Run this to install, import and use Pytorch default Segmentation models
!pip install segmentation_models_pytorch
!pip install albumentations
The Efficient-Net B4 model is as follows:
model_efficientnet_b4 = smp.Unet(
    encoder_name="efficientnet-b4",  # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet",      # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,                   # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                       # model output channels (number of classes in your dataset)
)
model_efficientnet_b4
2.2.2 Modification of Dataloaders for Data Augmentation. We modify the given data loaders to better suit our use case. Specifically:
Train DataSet: We change the given training dataset class to incorporate a third-party data augmentation library called Albumentations [5]. We use basic image transformations from this library, such as rotate, flip, shift, and scale, to augment the images and their corresponding masks, composing them all into a single composite transform. Image augmentation lets us expand the dataset and generalise better from the training data. One other change in the Train DataSet is the addition of an extra dimension to the images and masks, since the old data loader yields images of shape (H x W) rather than the standard (C x H x W) specified in the question, where C = 1 for greyscale.
ValDataset: Since we need a data loader for the validation set, and the training dataset now includes augmentations/transformations, we write a new dataset class for validation. This also includes the dimension expansion described above.
Test DataSet: This is almost the same as the one given, except for the same dimension expansion. We also add an extra return value, 'img_path', in the __getitem__ function to return the current image path and simplify the submission process.
from torch.utils.data import DataLoader
import torch, numpy as np
import torch.utils.data as data
import cv2
import os
from glob import glob
import albumentations as A

transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.2, scale_limit=0.2, rotate_limit=30, p=0.5, border_mode=1),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
])

class TrainDatasetModified(data.Dataset):
    def __init__(self, root=''):
        super(TrainDatasetModified, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))
        self.mask_files = []
        for img_path in self.img_files:
            basename = os.path.basename(img_path)
            self.mask_files.append(os.path.join(root, 'mask', basename[:-4] + '_mask.png'))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        mask_path = self.mask_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        label = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
        # apply the same augmentation to the image and its mask
        transformed = transform(image=data, mask=label)
        return (torch.from_numpy(np.expand_dims(transformed['image'], axis=0)).float(),
                torch.from_numpy(transformed['mask']).float())

    def __len__(self):
        return len(self.img_files)

class ValDatasetModified(data.Dataset):
    def __init__(self, root=''):
        super(ValDatasetModified, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))
        self.mask_files = []
        for img_path in self.img_files:
            basename = os.path.basename(img_path)
            self.mask_files.append(os.path.join(root, 'mask', basename[:-4] + '_mask.png'))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        mask_path = self.mask_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        label = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
        return torch.from_numpy(np.expand_dims(data, axis=0)).float(), torch.from_numpy(label).float()

    def __len__(self):
        return len(self.img_files)

class TestDatasetModified(data.Dataset):
    def __init__(self, root=''):
        super(TestDatasetModified, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        data = np.expand_dims(cv2.imread(img_path, cv2.IMREAD_UNCHANGED), axis=0)
        return torch.from_numpy(data).float(), img_path

    def __len__(self):
        return len(self.img_files)
# The following snippet mounts Google Drive for the dataset, defines the
# show_image_mask function for visualising images, and the dataset classes
# used by the train, val and test data loaders.
from google.colab import drive
import torch
import os
import cv2  # import OpenCV
import matplotlib.pyplot as plt
from glob import glob
import torch.utils.data as data

drive.mount('/content/drive')

# Visualising an image and its mask side by side
def show_image_mask(img, mask, cmap='gray'):
    fig = plt.figure(figsize=(5, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(img, cmap=cmap)
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.imshow(mask, cmap=cmap)
    plt.axis('off')

path = '/content/drive/MyDrive/data/train'

class TrainDataset(data.Dataset):
    def __init__(self, root=path):
        super(TrainDataset, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))
        self.mask_files = []
        for img_path in self.img_files:
            basename = os.path.basename(img_path)
            self.mask_files.append(os.path.join(root, 'mask', basename[:-4] + '_mask.png'))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        mask_path = self.mask_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        label = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
        # tensor values for data (image) and label (mask) are returned
        return torch.from_numpy(data).float(), torch.from_numpy(label).float()

    def __len__(self):
        return len(self.img_files)

class TestDataset(data.Dataset):
    def __init__(self, root=''):
        super(TestDataset, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        return torch.from_numpy(data).float(), img_path  # the image path is returned for later use

    def __len__(self):
        return len(self.img_files)
import numpy as np
import os
import cv2

def rle_encoding(x):
    '''
    *** Credit to https://www.kaggle.com/rakhlin/fast-run-length-encoding-python ***
    x: numpy array of shape (height, width), 1 - mask, 0 - background
    Returns run length as list
    '''
    dots = np.where(x.T.flatten() == 1)[0]
    run_lengths = []
    prev = -2
    for b in dots:
        if b > prev + 1:
            run_lengths.extend((b + 1, 0))
        run_lengths[-1] += 1
        prev = b
    return run_lengths

def submission_converter(mask_directory, path_to_save):
    writer = open(os.path.join(path_to_save, "submission.csv"), 'w')
    writer.write('id,encoding\n')
    files = os.listdir(mask_directory)
    for file in files:
        name = file[:-4]
        mask = cv2.imread(os.path.join(mask_directory, file), cv2.IMREAD_UNCHANGED)
        # one binary mask and one RLE string per foreground class
        mask1 = (mask == 1)
        mask2 = (mask == 2)
        mask3 = (mask == 3)
        encoded_mask1 = ' '.join(str(e) for e in rle_encoding(mask1))
        encoded_mask2 = ' '.join(str(e) for e in rle_encoding(mask2))
        encoded_mask3 = ' '.join(str(e) for e in rle_encoding(mask3))
        writer.write(name + '1,' + encoded_mask1 + "\n")
        writer.write(name + '2,' + encoded_mask2 + "\n")
        writer.write(name + '3,' + encoded_mask3 + "\n")
    writer.close()
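As a quick standalone check of the run-length encoder above (its body is repeated here so the snippet runs on its own): pixels are indexed 1-based in column-major order, which is what the Kaggle-style submission format expects.

```python
import numpy as np

# Same encoder as in the submission code above, repeated for a self-contained demo
def rle_encoding(x):
    dots = np.where(x.T.flatten() == 1)[0]
    run_lengths = []
    prev = -2
    for b in dots:
        if b > prev + 1:
            run_lengths.extend((b + 1, 0))
        run_lengths[-1] += 1
        prev = b
    return run_lengths

mask = np.array([[1, 0, 1],
                 [1, 0, 1]])

# Column-major traversal gives [1,1,0,0,1,1]: two runs of length 2,
# starting at 1-based pixels 1 and 5
print(rle_encoding(mask))  # [1, 2, 5, 2]
```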
We have created a single function to train and validate the models over the epochs. The function returns a trained model which can be saved and deployed for future predictions on test data. We also plot the training loss and validation loss over the epochs, which provides useful insight into the training and validation process.
from torch.utils.data import DataLoader
from collections import defaultdict
import time
import argparse
import torch.nn.functional as F
import torch.nn as nn
import matplotlib.pyplot as plt
import torch.optim as optim
from torch.optim import lr_scheduler

val_data_path = '/content/drive/MyDrive/data/val'
data_path = '/content/drive/MyDrive/data/train'  # training data path
num_workers = 4
batch_size = 4
epoch_train_losses = []
epoch_val_losses = []

val_set = TrainDataset(val_data_path)
val_data_loader = DataLoader(dataset=val_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
train_set = TrainDataset(data_path)
training_data_loader = DataLoader(dataset=train_set, num_workers=num_workers, batch_size=batch_size, shuffle=True)

def plot_losses(epoch_train_losses, epoch_val_losses):
    plt.plot(epoch_train_losses, 'b-', label='Training Losses')
    plt.plot(epoch_val_losses, color='orange', label='Validation Losses')
    plt.xlabel('Epochs')
    plt.ylabel('Error')
    plt.legend()
    plt.show()

def train_and_validate(model,
                       device,
                       optimizer,
                       epochs: int = 10,
                       batch_size: int = 1,
                       learning_rate: float = 3e-4,
                       criterion=nn.CrossEntropyLoss(),
                       ):
    # Clear losses from any previous run so the lists and graph are correct
    epoch_train_losses.clear()
    epoch_val_losses.clear()
    if isinstance(criterion, torch.nn.Module):  # make sure the criterion is on the correct device
        criterion.to(device)
    # scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)  # defined but not stepped in these experiments
    for epoch in range(epochs):
        since = time.time()
        train_loss = 0.0
        model.train()
        for iteration, sample in enumerate(training_data_loader):
            img, mask = sample
            img = img.unsqueeze(1)
            img = img.to(device=device)
            # https://discuss.pytorch.org/t/only-batches-of-spatial-targets-supported-non-empty-3d-tensors-but-got-targets-of-size-1-1-256-256/49134/18
            mask = mask.squeeze(1)
            mask = mask.to(device, dtype=torch.long)
            out = model(img)
            optimizer.zero_grad()
            # Compute the loss, backpropagate and optimise
            loss = criterion(out, mask)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * img.size(0)
        val_loss = 0.0
        model.eval()
        with torch.no_grad():
            for image, mask in val_data_loader:
                image = image.to(device)
                image = image.unsqueeze(1)
                mask = mask.to(device, dtype=torch.long)
                out = model(image)
                loss = criterion(out, mask.long())
                val_loss += loss.item() * image.size(0)
        epoch_train_losses.append(train_loss / len(train_set))
        epoch_val_losses.append(val_loss / len(val_set))
        print(f'Epoch {epoch+1} \t\t Training Loss: {train_loss / len(train_set)} \t\t Validation Loss: {val_loss / len(val_set)}')
    plot_losses(epoch_train_losses, epoch_val_losses)
    return model
The UNET_Modified1 model was trained on the data with CrossEntropyLoss and the RMSProp optimizer. This model was a primitive version of U-Net and produced oscillations in the training loss, as observed in the graph below.
The above graph (loss vs step) indicates poor performance. To overcome this instability we constructed another model with a BatchNormalisation layer between each Conv2d and ReLU layer in the encoder and decoder blocks; BatchNormalisation noticeably increased the stability of training. The resulting U-Net architecture was named UNET_Modified2. Initially, we trained this model for 10 epochs with CrossEntropyLoss and the RMSProp optimiser. Despite a significant decrease in the training loss, oscillations persisted. To solve this, we explored other optimizers and selected Adam [9]. The model was retrained for 25 epochs, with the following training loss at each step:
The oscillations were dampened. This was strong evidence for going ahead with the Adam optimiser and CrossEntropyLoss.
Further, we merged the training and validation code under a single function. For each epoch, the model was put into trained mode and trained on the training dataset and immediately validated in eval mode using the validation dataset. This helped us analyse the behaviour of the model and its learning and validation for every epoch.
The performance of the UNET_Modified2 model with the RMSProp optimiser, a learning rate tuned to 0.000001, and CrossEntropyLoss can be visualised as follows:
We observe that the gap between the training loss and validation loss is large and non-convergent, which justifies our choice of Adam over RMSProp as the optimizer. The training and validation loss of UNET_Modified2 with the Adam optimiser, CrossEntropyLoss, and an experimentally selected learning rate of 3e-4 is visualised as follows:
It can be clearly seen that the gap between training and validation loss closes as the number of epochs increases. We select 30 epochs as optimal to avoid overfitting the training data. However, some of the test-set predictions were still poor, and the Kaggle submission score for UNET_Modified2 was around 69%. To improve the results, we switched to transfer learning.
Under the transfer learning approach we explored and decided to use segmentation_models_pytorch. The library contains different U-Net-based encoders with weights pre-trained on huge datasets. Initially the ResNet50 encoder was selected and trained on our dataset. The predictions were better than in the previous experiments, and after Kaggle submission we achieved a score of 72%. The train loss vs validation loss is as follows:
We inferred that ResNet50 has a high number of parameters, which may be overwhelming for our small dataset. The ResNet34 encoder was selected next as it has fewer parameters. Its predictions, however, were poorer than our previous experiments, with a Kaggle score of 68%.
The models experimented so far can be compared as :
UNET_MODIFIED1 < UNET + RESNET34 < UNET_MODIFIED2 < UNET + RESNET50.
To further improve the prediction score, we explored newer transfer-learning models [13] that are lightweight yet provide state-of-the-art segmentation results, which led us to the EfficientNet models. We used the EfficientNet-B4 encoder (chosen for its balance between parameter count and accuracy) for transfer learning and trained it on our data. To improve the training process we also performed data augmentation to enlarge the training dataset, and better convergence was observed. After training we obtained the graph of training vs validation loss over epochs shown below, with good convergence: as the epochs increase, the validation loss approaches the training loss.
It can be inferred that the formulated model is satisfactory and that 30 epochs are sufficient for it to learn the important features for the given image segmentation task. With this model we achieved a good score of 89% in the Kaggle competition.
For training our models so far, we used the standard CrossEntropy loss provided by PyTorch. However, the loss mostly stagnated after a certain number of epochs and training made no further progress. Different optimizers such as Adam, RMSprop, SGD, and Adagrad with different learning rates and other hyperparameters had no effect on this plateau. Exploring other loss functions for segmentation tasks, we came across DiceLoss in the same Segmentation Models library that provided our pretrained models. Using the log_loss mode (-log(dice_coeff)) of the multiclass dice loss, we were able to reduce the loss even further when continuing training from the best model trained with CrossEntropy loss. Another reason for choosing this loss function is that the final evaluation is done using the dice metric [4].
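To illustrate the idea behind the library's multiclass DiceLoss with log_loss, here is a minimal NumPy sketch of a per-class soft Dice coefficient with the -log(dice) variant. The `dice_loss` helper is a hypothetical illustration written for this report, not the library's actual implementation.

```python
import numpy as np

def dice_loss(probs, target, num_classes=4, eps=1e-7, log_loss=True):
    """probs: (C, H, W) softmax probabilities; target: (H, W) integer labels.

    Per class c: dice = (2*|p*t| + eps) / (|p| + |t| + eps),
    loss = -log(dice) if log_loss else 1 - dice; averaged over classes.
    """
    losses = []
    for c in range(num_classes):
        p = probs[c]
        t = (target == c).astype(np.float64)
        dice = (2.0 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
        losses.append(-np.log(dice) if log_loss else 1.0 - dice)
    return float(np.mean(losses))

# A perfect one-hot prediction gives dice = 1 per class, so -log(dice) = 0
target = np.array([[0, 1], [2, 3]])
perfect = np.stack([(target == c).astype(np.float64) for c in range(4)])
print(dice_loss(perfect, target))  # ~0.0
```

Unlike CrossEntropy, which averages over pixels, this loss weights each class equally, which helps with the small foreground structures in cardiac masks.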
#Training with Final UNET model with Optimiser as RMS PROP, passed directly as arguments.
lr_rate = 0.000001
batch_size = 4
model = model_UNET2
learned_model = train_and_validate(model ,device, optim.RMSprop(model.parameters(), lr=lr_rate, weight_decay=1e-8, momentum=0.9), 15, batch_size, learning_rate=lr_rate)
Observation: No convergence occurred when using the RMSProp optimiser, so we decided to try a different optimiser.
lr_rate = 3e-4
batch_size = 4
model = model_UNET2
learned_model = train_and_validate(model ,device, optim.Adam(model.parameters(),lr = lr_rate), 30, batch_size, learning_rate=lr_rate)
# In this block we load the saved model and deploy it on all data in the test set to
# produce segmentation masks as png images valued 0,1,2,3 for the submission to Kaggle.
import re
from torchvision.utils import save_image

data_path = '/content/drive/MyDrive/data/test/'
num_workers = 2
batch_size = 1
dir_to_save = '/content/mask_UNET/'
Learned_model = model
test_set = TestDataset(data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
Learned_model.eval()
for iteration, sample in enumerate(test_data_loader):
    img, img_pth = sample
    img = img.unsqueeze(1)  # add the channel dimension: (N, H, W) -> (N, 1, H, W)
    img = img.to(device)
    out = Learned_model(img)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()  # per-pixel argmax over the 4 classes
    image_np = img.cpu().detach().numpy()
    for i in range(batch_size):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_pth[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(dir_to_save, image_name), out_np[i])
# Baseline check: deploy the ResNet50-encoder U-Net with ImageNet weights only
# (no fine-tuning on our data) on the test set.
import segmentation_models_pytorch as smp

model_resnet50 = smp.Unet(
    encoder_name="resnet50",    # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet", # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,              # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                  # model output channels (number of classes in your dataset)
)

import re
from torchvision.utils import save_image

data_path = '/content/drive/MyDrive/data/test/'
num_workers = 2
batch_size = 1
dir_to_save = '/content/mask_UNET/'
Learned_model = model_resnet50
Learned_model = Learned_model.to(device)
test_set = TestDataset(data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
Learned_model.eval()
for iteration, sample in enumerate(test_data_loader):
    img, img_pth = sample
    img = img.unsqueeze(1)
    img = img.to(device)
    out = Learned_model(img)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()
    image_np = img.cpu().detach().numpy()
    for i in range(batch_size):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_pth[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(dir_to_save, image_name), out_np[i])
Observation: The results were very poor compared to our adapted UNET model. The reason is obvious: this is a general model that has not been fine-tuned on our dataset.
lr_rate = 3e-4
batch_size = 4
model = smp.Unet(
    encoder_name="resnet50",    # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet", # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,              # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                  # model output channels (number of classes in your dataset)
)
model = model.to(device)
learned_model = train_and_validate(model, device, optim.Adam(model.parameters(), lr=lr_rate), 30, batch_size, learning_rate=lr_rate)
# Deploy the fine-tuned ResNet50 model on the test set to produce submission masks.
import re
from torchvision.utils import save_image

data_path = '/content/drive/MyDrive/data/test/'
num_workers = 2
batch_size = 1
dir_to_save = '/content/mask_UNET/'
learned_model = learned_model.to(device)
test_set = TestDataset(data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
learned_model.eval()
for iteration, sample in enumerate(test_data_loader):
    img, img_pth = sample
    img = img.unsqueeze(1)
    img = img.to(device)
    out = learned_model(img)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()
    image_np = img.cpu().detach().numpy()
    for i in range(batch_size):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_pth[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(dir_to_save, image_name), out_np[i])
Observation:- Predictions were better than the previous experiments. After submission we achieved a score of 72%.
lr_rate = 3e-4
batch_size = 4
model = smp.Unet(
    encoder_name="resnet34",    # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet", # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,              # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                  # model output channels (number of classes in your dataset)
)
model = model.to(device)
learned_model = train_and_validate(model, device, optim.Adam(model.parameters(), lr=lr_rate), 30, batch_size, learning_rate=lr_rate)
# Deploy the fine-tuned ResNet34 model on the test set to produce submission masks.
import re
from torchvision.utils import save_image

data_path = '/content/drive/MyDrive/data/test/'
num_workers = 2
batch_size = 1
dir_to_save = '/content/mask_UNET/'
learned_model = learned_model.to(device)
test_set = TestDataset(data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
learned_model.eval()
for iteration, sample in enumerate(test_data_loader):
    img, img_pth = sample
    img = img.unsqueeze(1)
    img = img.to(device)
    out = learned_model(img)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()
    image_np = img.cpu().detach().numpy()
    for i in range(batch_size):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_pth[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(dir_to_save, image_name), out_np[i])
Visual inspection of the test set and its inference: we observed that the predictions looked good, but the Kaggle score obtained was only 68%. This ranks the models as follows: ResNet-34 < Adapted UNet2 < ResNet-50.
To improve the scores further we explored other transfer-learning encoders. EfficientNet-B4 is a lightweight model well suited to our purpose.
lr_rate = 3e-4
batch_size = 4
model = smp.Unet(
    encoder_name="efficientnet-b4",  # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet",      # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,                   # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                       # model output channels (number of classes in your dataset)
)
model = model.to(device)
learned_model = train_and_validate(model, device, optim.Adam(model.parameters(), lr=lr_rate), 30, batch_size, learning_rate=lr_rate)
Testing the above model:
# Load the saved model and deploy it on all data in the test set to produce
# segmentation masks as png images valued 0,1,2,3 for the Kaggle submission.
import re
from torchvision.utils import save_image
data_path = '/content/drive/MyDrive/data/test/'
dir_to_save = '/content/mask_EFFNET/'
num_workers = 2
batch_size = 1
learned_model = learned_model.to(device)
test_set = TestDataset(data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
learned_model.eval()
for iteration, sample in enumerate(test_data_loader):
    img, img_pth = sample
    img = img.unsqueeze(1)  # add a channel dimension: (B, H, W) -> (B, 1, H, W)
    img = img.to(device)
    out = learned_model(img)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()  # per-pixel argmax over the 4 classes
    image_np = img.cpu().detach().numpy()
    for i in range(1):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_pth[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(dir_to_save, image_name), out_np[0])
For training our models until now, we had been using the standard CrossEntropy loss provided by PyTorch. However, the loss mostly stagnated after a certain number of epochs and training made no further progress. Different optimizers (Adam, RMSprop, SGD, Adagrad, etc.) with different learning rates and other hyperparameters also had no noticeable effect on the training loss. On exploring other loss functions for segmentation tasks, we came across DiceLoss in the same segmentation library that provided our pretrained models. Using its log-loss mode (-log(dice_coeff)) of the multiclass Dice loss, and starting from the best model trained with CrossEntropy loss, we were able to reduce the loss further. Another reason for choosing this loss function is that the final evaluation is also based on the Dice score.
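As an illustration of the idea (a minimal NumPy sketch, not the segmentation_models_pytorch implementation), the per-class Dice coefficient and the -log(dice) mode described above can be written as:

```python
# Minimal sketch of the per-class Dice coefficient and the log-loss mode
# (-log(dice)) discussed above. Illustration only; smp.losses.DiceLoss
# operates on logits and averages over classes internally.
import numpy as np

def dice_coeff(pred, target, cls, eps=1e-7):
    """Dice overlap between two integer label maps, restricted to one class."""
    p = (pred == cls).astype(np.float64)
    t = (target == cls).astype(np.float64)
    intersection = (p * t).sum()
    return (2.0 * intersection + eps) / (p.sum() + t.sum() + eps)

pred = np.array([[0, 1, 1],
                 [0, 2, 2]])
target = np.array([[0, 1, 1],
                   [0, 1, 2]])
d = dice_coeff(pred, target, cls=1)  # 2*2 / (2 + 3) = 0.8
log_dice_loss = -np.log(d)           # penalises low overlap more sharply than 1 - dice
```

Near a perfect overlap, 1 - dice flattens out, while -log(dice) keeps a usable gradient, which matches the behaviour we saw when switching losses.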
For the final submission, based on the experimentation and observations above, we take the following points into consideration:
To conclude, the transfer-learning approach, EfficientNet-B4 with weights pretrained on ImageNet, produced the best model for our problem given the very limited dataset. Our strategy of first building a custom model from scratch gave us good intuition about how the data and models behaved, which drove the many experiments that ultimately led us to select the best model.
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b4",  # choose encoder
    encoder_weights="imagenet",      # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,                   # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=4,                       # model output channels (number of classes in your dataset)
)
model.to(device)
data_path = '/content/drive/MyDrive/data/train/'
num_workers = 4
batch_size = 5
train_set = TrainDatasetModified(data_path)  # using the augmented training set
training_data_loader = DataLoader(dataset=train_set, num_workers=num_workers, batch_size=batch_size, shuffle=True)
# loss_fn = torch.nn.CrossEntropyLoss()
# Using the multiclass Dice loss in log mode instead of cross-entropy:
loss_fn = smp.losses.DiceLoss('multiclass', classes=None, log_loss=True, from_logits=True, smooth=0.0, ignore_index=None, eps=1e-07)
train_loss = []
model.train()
opt = torch.optim.Adam(model.parameters(), lr=0.0008)
for epoch in range(60):  # loop over the dataset
    running_loss = 0.0
    for image, mask in training_data_loader:
        image = image.to(device)
        mask = mask.to(device)
        out = model(image)
        loss = loss_fn(out, mask.long())
        opt.zero_grad()
        loss.backward()
        opt.step()
        running_loss += loss.item() / batch_size
    train_loss.append(running_loss)
    print('Train loss for epoch', epoch, ':', running_loss)
    if epoch % 10 == 0:
        torch.save(model, 'u-effb4_chckpnt.pth')
torch.save(model, 'u-effb4.pth')
print('Finished training. Model saved.')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"The current device is {device}")
loss_fn = torch.nn.CrossEntropyLoss()
num_workers = 4
batch_size = 1
data_path = '/content/drive/MyDrive/data/val/'
val_set = ValDatasetModified(data_path)
val_data_loader = DataLoader(dataset=val_set, num_workers=num_workers, batch_size=batch_size, shuffle=True)
model.eval()
total_val_score = 0
for image, mask in val_data_loader:
    image = image.to(device)
    mask = mask.to(device)
    out = model(image)
    loss = loss_fn(out, mask.long())
    print('CrossEntropy loss for image:', loss.item())
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()
    mask_np = mask.cpu().detach().numpy()
    image_np = image.cpu().detach().numpy()
    # average the Dice score over the three foreground classes
    d_score = 0
    for j in range(1, 4):
        d_score += categorical_dice(mask_np[0], out_np[0], j)
    total_val_score += d_score / 3
    for i in range(1):
        show_image_mask_mask(image_np[i, 0], mask_np[i], out_np[i], cmap='gray')
        plt.pause(1)
print("Val_score:", total_val_score / 20)
# Load the final model and generate masks for the test set
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"The current device is {device}")
model = torch.load('u-effb4_100.pth')
model.to(device)
model.eval()
input_test_data_path = '/content/drive/MyDrive/data/test/'
output_test_mask_path = '/content/mask_EFFNET/'
num_workers = 4
batch_size = 1
test_set = TestDatasetModified(input_test_data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=num_workers, batch_size=batch_size, shuffle=False)
for image, img_paths in test_data_loader:
    image = image.to(device)
    out = model(image)
    out_np = torch.max(out, 1).indices.cpu().detach().numpy()
    image_np = image.cpu().detach().numpy()
    for i in range(batch_size):
        show_image_mask(image_np[i, 0], out_np[i], cmap='gray')
        plt.pause(1)
        image_name = img_paths[i].split('/')[-1].split('.')[0] + '_mask.png'
        cv2.imwrite(os.path.join(output_test_mask_path, image_name), out_np[0])
import numpy as np
import os
import cv2

def rle_encoding(x):
    '''
    *** Credit to https://www.kaggle.com/rakhlin/fast-run-length-encoding-python ***
    x: numpy array of shape (height, width), 1 - mask, 0 - background
    Returns run length as list
    '''
    dots = np.where(x.T.flatten() == 1)[0]  # column-major order
    run_lengths = []
    prev = -2
    for b in dots:
        if b > prev + 1:
            run_lengths.extend((b + 1, 0))  # new run: 1-based start position
        run_lengths[-1] += 1                # extend the current run
        prev = b
    return run_lengths

def submission_converter(mask_directory, path_to_save):
    writer = open(os.path.join(path_to_save, "submission.csv"), 'w')
    writer.write('id,encoding\n')
    files = os.listdir(mask_directory)
    for file in files:
        name = file[:-4]
        mask = cv2.imread(os.path.join(mask_directory, file), cv2.IMREAD_UNCHANGED)
        # encode each foreground class (1, 2, 3) separately
        mask1 = (mask == 1)
        mask2 = (mask == 2)
        mask3 = (mask == 3)
        encoded_mask1 = ' '.join(str(e) for e in rle_encoding(mask1))
        encoded_mask2 = ' '.join(str(e) for e in rle_encoding(mask2))
        encoded_mask3 = ' '.join(str(e) for e in rle_encoding(mask3))
        writer.write(name + '1,' + encoded_mask1 + "\n")
        writer.write(name + '2,' + encoded_mask2 + "\n")
        writer.write(name + '3,' + encoded_mask3 + "\n")
    writer.close()

submission_converter('/content/mask_EFFNET/', '/content/Submission')
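As a quick sanity check of the column-major, 1-based encoding on a tiny hand-made mask (the function is restated so the snippet runs standalone):

```python
# Standalone sanity check of the run-length encoding used for the submission
# (rle_encoding restated here so the snippet runs on its own).
import numpy as np

def rle_encoding(x):
    # x: numpy array of shape (height, width), 1 - mask, 0 - background
    dots = np.where(x.T.flatten() == 1)[0]  # column-major order, as Kaggle expects
    run_lengths = []
    prev = -2
    for b in dots:
        if b > prev + 1:
            run_lengths.extend((b + 1, 0))  # new run: 1-based start position
        run_lengths[-1] += 1                # extend the current run
        prev = b
    return run_lengths

mask = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 1]])
# Column-major flattening gives [1,1,0, 1,0,0, 0,0,1]:
# runs start at positions 1 (length 2), 4 (length 1), and 9 (length 1).
print(rle_encoding(mask))  # -> [1, 2, 4, 1, 9, 1]
```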
URL: https://smp.readthedocs.io/en/latest/losses.html#diceloss accessed on Nov 12, 2021
URL: https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47 accessed on Nov 15, 2021
URL: https://towardsdatascience.com/4-pre-trained-cnn-models-to-use-for-computer-vision-with-transfer-learning-885cb1b2dfc accessed on Nov 26, 2021
Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin, A. A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125. https://doi.org/10.3390/info11020125
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer, Cham.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (pp. 6105-6114). PMLR.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).